Single unit recordings in the human medial temporal lobe (MTL) have revealed a population of 
cells with conceptually based, highly selective activity, indicating the presence of a sparse neural 
code. Building off previous work by the author and J.C. Collins, this paper develops a statistical 
model for analyzing this data, based on maximum likelihood analysis. The goal is to infer the 
underlying distribution of neural response probabilities across the population of MTL cells. The 
response probability, or neuronal sparsity, is defined as the total probability that the neuron produces 
an above-threshold firing rate during the presentation of a randomly selected stimulus. Applying 
the method, it is shown that a beta-distributed neuronal sparsity across the cells of the MTL 
is consistent with the data. The resulting fits reveal a sparse and highly skewed code, with a 
large majority of neurons exhibiting extremely low response probabilities, and a small minority 
possessing considerably higher response probabilities. The distributions are closely approximated 
by a power law at low sparsity values. Strikingly similar skewed distributions have been found in 
the statistics of place cell activity in rats, suggesting similar underlying coding dynamics between 
the human MTL and the rat hippocampus. 


I. INTRODUCTION 


The sparse coding hypothesis states that neural processing of sensory information is organized
to produce representations of salient aspects of the environment (people, objects, landmarks,
etc.) using only a small number of strongly activated neurons [T]. This kind of representation
lies between two theoretical extremes: dense coding, in which each stimulus is represented in
the activation of a substantial proportion of the available cells; and local coding, where each
object is represented by the firing of a single neuron US]. Sparse coding schemes possess many
favorable properties including a high storage capacity, energy efficiency, and ease of
readability (for references and further discussion, see the review by Olshausen and Field 0).

Experimental detection of sparse codes involves identifying numerous cells that are highly
selective, responding to a very small proportion of complex stimuli HIT]. Examples include
odor-specific Kenyon cells in locusts [5], V1 cells in cats and mice [T0|, and neurons in the
temporal cortex in non-human primates p iTTHI^. Also, the RA-projecting neurons of the HVC in
zebra finches exhibit extremely sparse activity during song production [14]. In humans,
striking evidence of sparse coding has been observed in the medial temporal lobe in a series of
experiments HMH], including the concept cells reported in Quian Quiroga et al. m, which were
observed to respond to stimuli related to a single concept (e.g. the “Jennifer Aniston neuron”)
out of nearly 100 other concepts presented.

Therefore, characterizing the sparseness of the neuronal representations is an important goal.
Towards this end, one metric used is the total fraction of stimuli that elicit a response in a
particular neuron, which we term the neuronal sparsity, α:

$$ \alpha = \frac{\#\ \text{stimuli triggering a response}}{\text{total}\ \#\ \text{of possible stimuli}} \tag{1} $$



This definition of sparsity is a property of the neuron itself, and is the total probability
that the cell responds to a randomly chosen stimulus. It is not necessarily equal to the
fraction of stimuli that elicit a response during a particular experiment in which only a small
subset of possible stimuli are presented, as used in Ison et al., for example. If a particular
cell remains unresponsive during the presentation of 100 randomly selected images, its sparsity
is not necessarily 0; it may instead have a very low but non-zero sparsity. In other words, α
is a quantity that must be inferred from a particular experiment rather than directly
calculated.
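
The point about silent cells can be made concrete with a small calculation. The sketch below assumes the binary, independent-response picture used in this paper; the function name and the example sparsity value are hypothetical and are not taken from the data.

```python
def prob_silent(alpha: float, n_stimuli: int) -> float:
    """Probability that a binary neuron with sparsity alpha responds to none
    of n_stimuli independently and randomly chosen stimuli."""
    return (1.0 - alpha) ** n_stimuli


# Hypothetical example: a neuron that responds to 0.5% of all possible stimuli.
p = prob_silent(alpha=0.005, n_stimuli=100)
print(f"P(no response in 100 stimuli) = {p:.3f}")
```

A neuron of this kind would stay silent in most 100-image sessions even though its sparsity is not zero, which is why α has to be inferred statistically.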

This approach treats neuronal activity as binary “active-vs-inactive” over a certain
post-stimulus time window. This is appropriate for cells that exhibit highly elevated firing
rates under specific circumstances compared with the baseline rate da ED, with few in-between
cases. Place cells, for example, display high firing rates when the organism is at a particular
location in the environment and much lower firing rates otherwise [221128].

The principal goal of this paper is to extend the model developed by the author and Collins
[19] for fitting the human MTL data presented in Mormann et al. [24] in order to estimate the
distribution of sparsity across the population of neurons. In that work, the cells were split
into two discrete populations, each with a characteristic sparsity value. In this paper, the
model is extended to include continuous sparsity distributions.

Specifically, the neuronal sparsity in the MTL is assumed to follow a beta distribution.
Motivation for choosing the beta distribution comes from its common usage as a distribution of
probabilities, giving it wide application in Bayesian statistics [2^. The PDF with parameters
a and b is given by


$$ D_{a,b}(\alpha) = \frac{\alpha^{a-1} (1-\alpha)^{b-1}}{B(a,b)} \tag{2} $$

where the normalization factor B(a, b) is the beta function. This produces an overdispersed
model for the neural responses, predicting a heavier tail than if all neurons had the same
sparsity. Such overdispersed, or skewed, models are suspected to play a prevalent role in many
aspects of neural systems [lillT], and such distributions have been found in place cell
activity of the rat CA1 [28].

Following the fitting procedure developed by the author and Collins in [19], we perform
inferential statistics, treating both the neurons and the presented stimuli as random samples
from their respective “universe”. This way, we can infer the response properties of all neurons
in a brain region, rather than merely describing the data, as is done in previous analyses
[5D]. The data reported in Mormann et al. [24] are fit to the beta model using maximum
likelihood analysis, yielding the best estimates of the beta distribution parameters, a and b.
Then, the goodness of fit is assessed using χ² analysis.

The model is shown to produce acceptable fits in all four subdivisions of the MTL for which
data is available: the hippocampus (Hipp), the entorhinal cortex (EC), the amygdala (Amy), and
the parahippocampal cortex (PHC). The estimated beta distributions for the population of human
MTL neurons indicate:

• The neuronal sparsity is highly non-uniform, with the least selective 5% of neurons
possessing a sparsity over 0.01.

• D_{a,b}(α) nearly has a power-law divergence with exponent ≈ −1 at low sparsity.

• The mean sparsity of MTL neurons is very low, on the order of 10⁻³.

This model marks an improvement over the model previously developed in [19], where the neurons
were assumed to be split into two populations, each characterized by a sparsity value. That
two-population model was able to produce good fits in the Hipp and EC, but it failed to fit the
data from Amy and PHC. Furthermore, the beta model has only two fitting parameters, while the
two-population model has three.

Finally, the skewed, highly non-uniform beta-binomial distributions of responses in the human
MTL are compared with strikingly similar results recently observed in the statistics of place
cell activity of the rat hippocampus reported in Rich et al. [28]. Specifically, they observe
that the number of place fields recruited per cell follows a skewed gamma-Poisson distribution.
Gamma-Poisson distributions are simply limiting cases of beta-binomial distributions (shown in
Appendix B). This strongly suggests that place cell codes and concept cell codes are two
different manifestations of the same underlying dynamics, in line with previous suspicions
[29].

II. BETA-BINOMIAL MODEL 

We define the sparsity, α, of a particular binary neuron according to Eq. (1). In the
experiment we analyze, a neuron is considered responsive to a particular stimulus if its firing
rate exceeds the baseline rate by a statistical threshold during an appropriate post-stimulus
time frame (see [21] for details).

We define a stimulus as an image of an individual object, person, building, etc., as used in
[21]. The set of stimuli presented to the patients in [24] constitutes a tiny random sample
from the universe of all stimuli. In this section, we extend our previous procedure to include
continuous distributions of sparsity, D_θ(α), with fitting parameters θ. In particular, we
postulate that the sparsity of a randomly selected neuron from the MTL is sampled from a beta
distribution with PDF given by Eq. (2).

Examining the data presented in Mormann et al. [24], we note that there is a large proportion
of unresponsive cells, and that a significant proportion of responsive cells respond to only
one image. Hence, we would expect a sparsity distribution skewed towards zero sparsity. Note
that Eq. (2) diverges as α → 0 when a < 1, which is what we would expect for this data. For
a ≈ 0, the distribution behaves like a power law as α → 0; however, at a = 0 the distribution
diverges too strongly at zero sparsity to be normalized.

Let S be the number of stimuli presented to the patient during an experiment, and let N be the
number of recorded neurons, each with a sparsity sampled from Eq. (2). The neurons are assumed
statistically independent, as measured in m. For a particular neuron, let K equal the number of
stimuli that were measured to evoke a response in that neuron, and let n_k, for
k = 0, 1, ..., S, be the number of neurons that respond to k out of the S stimuli presented.
Then, we follow earlier work [19] and derive the likelihood function.

For the data we analyze, we must take into consideration that not all units isolated by the
spike sorting algorithm consist of a single neuron m l24]. Limitations in the spike sorting
procedure mean that some fraction of the recorded units represent the activity of multiple
neurons. If the activity of a unit is the combined activity of several neurons, then on average
that unit will respond to more stimuli over the course of an experiment compared with a unit
consisting of a single neuron. Thus, we will carry out the calculation in two cases: for the
first case we assume all units consist of a single neuron, and for the second case we assume
some fraction of units, p, are composed of two neurons while the rest consist of single
neurons.


A. Derivation of Beta-Binomial Response 
Probability 

In this subsection, the relevant results developed by the author and Collins in [19] are
summarized, making appropriate modifications for the introduction of a continuous distribution
of sparsity.

During the presentation of S randomly selected stimuli, a single pseudo-binary unit with
sparsity α responds to K = k of the stimuli with a conditional probability given by the
binomial distribution [19]:

$$ P(K = k \mid \alpha) = \binom{S}{k} \alpha^{k} (1-\alpha)^{S-k} \tag{3} $$


If the sparsity α is sampled from a continuous distribution D_θ(α), then the total
probability, ε_k(θ), that a randomly selected unit responds to k stimuli is given by:

$$ P(K = k) = \int_0^1 d\alpha\, D_\theta(\alpha) \binom{S}{k} \alpha^{k} (1-\alpha)^{S-k} = \epsilon_k(\theta). \tag{4} $$

Substituting Eq. (2) into Eq. (4) and evaluating the integral yields the beta-binomial
distribution:

$$ \epsilon_k(a,b) = \binom{S}{k} \frac{B(a+k,\, b+S-k)}{B(a,b)} \tag{5} $$



Eq. (5) represents the probability that a single neuron responds to k out of S presented
stimuli. Mixing the binomial response with a beta distribution over the parameter α produces
an overdispersed distribution of neural responses. This accommodates the heavy tails present in
the data of Mormann et al. [24], displayed in Table I.
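
The beta-binomial response probability can be evaluated numerically with log-gamma functions, which avoids overflow in the binomial coefficient. The following is a minimal sketch rather than the code used for the paper: the function names are illustrative, S = 97 is assumed from the average number of images quoted in Sec. III, and the values of a and b are the single-unit Hipp entries of Table II.

```python
from math import lgamma, exp


def log_beta(x: float, y: float) -> float:
    """Natural log of the beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def beta_binomial_pmf(k: int, S: int, a: float, b: float) -> float:
    """Probability that one neuron responds to k of S stimuli when its
    sparsity is drawn from Beta(a, b)."""
    return exp(log_binom(S, k) + log_beta(a + k, b + S - k) - log_beta(a, b))


# Assumed inputs: S from the average stimulus count, (a, b) from Table II.
S, a, b = 97, 0.17, 66.0
pmf = [beta_binomial_pmf(k, S, a, b) for k in range(S + 1)]
print("normalization:", sum(pmf))
print("first five probabilities:", [round(p, 4) for p in pmf[:5]])
```

Most of the probability mass sits at k = 0, with a slowly decaying tail, which is the qualitative shape of the counts in Table I.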

This allows us to fit the outcome of an experiment in which S stimuli are presented to N
neurons, recorded in parallel. The result of the experiment is the set of numbers
n_0, n_1, n_2, ..., n_S, where n_k is the number of recorded neurons that respond to k of the
stimuli. The expected number of cells per bin, n̄_k, assuming the responses are sampled from
Eq. (5), is given by:

$$ \bar{n}_k = N \epsilon_k(a,b) \pm \sqrt{N \epsilon_k(a,b) \left(1 - \epsilon_k(a,b)\right)}. \tag{6} $$


To find the values of a and b that provide the best n̄_k, we employ the method of maximum
likelihood. This involves maximizing the likelihood function for the data given the beta model,
ℒ(a, b). The likelihood function is derived in [19], and the same result applies here, giving:

$$ \mathcal{L}(a,b) = W\{n_0, n_1, \ldots, n_S\} \prod_{k=0}^{S} \left[\epsilon_k(a,b)\right]^{n_k} \tag{7} $$

where the normalization constant W{n_0, n_1, ..., n_S} is the multinomial coefficient, i.e. the
number of ways of rearranging N objects without changing the n_k values. Since
W{n_0, n_1, ..., n_S} is independent of the parameters a and b, it does not need to be taken
into consideration during maximization. Maximizing ℒ(a, b) gives the parameter values a_0 and
b_0 that give the best fit, n̄_k. This was performed numerically using Mathematica.
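
The maximization was done in Mathematica; as an illustrative stand-in, the sketch below maximizes the log-likelihood (dropping the constant multinomial factor) with a coarse grid search in Python. The grid ranges, the function names and the choice S = 97 are assumptions made for this example, and the counts are the Hipp row of Table I with all bins above k = 14 taken as empty.

```python
from math import lgamma


def log_eps(k: int, S: int, a: float, b: float) -> float:
    """Log of the beta-binomial probability of responding to k of S stimuli."""
    log_binom = lgamma(S + 1) - lgamma(k + 1) - lgamma(S - k + 1)
    log_b_num = lgamma(a + k) + lgamma(b + S - k) - lgamma(a + b + S)
    log_b_den = lgamma(a) + lgamma(b) - lgamma(a + b)
    return log_binom + log_b_num - log_b_den


def log_likelihood(counts, S: int, a: float, b: float) -> float:
    """Log-likelihood of the binned counts, without the constant factor W."""
    return sum(n * log_eps(k, S, a, b) for k, n in enumerate(counts) if n > 0)


# Hipp row of Table I (n_0 ... n_14); S = 97 is an assumed stimulus count.
hipp = [1019, 113, 30, 17, 7, 4, 1, 2, 0, 0, 0, 0, 1, 0, 0]
S = 97

# Coarse, hypothetical search grid: a in (0, 0.5], b in [5, 150].
a_grid = [0.01 * i for i in range(1, 51)]
b_grid = [float(j) for j in range(5, 151)]
best = max((log_likelihood(hipp, S, a, b), a, b) for a in a_grid for b in b_grid)
print("max log-likelihood, a, b on the grid:", best)
```

A grid search is used only because it is easy to verify by eye; a gradient-based optimizer would be the more usual choice for a production fit.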

The expectation values, n̄_k = N ε_k(a_0, b_0), are compared with the data, n_k, using χ²
analysis to assess the goodness of fit. The χ² statistic is defined:

$$ \chi^2(k_{\max}; n_1, n_2, \ldots) = \sum_{k=1}^{k_{\max}} \frac{\left(n_k - N\epsilon_k\right)^2}{N\epsilon_k} \tag{8} $$

For a good fit,

$$ \chi^2 \approx (\#\ \text{of data points}) - (\#\ \text{of model parameters}) \tag{9} $$

This procedure only applies for n_k larger than a few, so for our model, we fit only the first
five data points (n_0, n_1, n_2, n_3, and n_4) in order to include only the bins with
significant responses. Thus, the fit is good when χ² ≈ 3.
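
The goodness-of-fit statistic is straightforward to compute once the model probabilities are available. The sketch below is illustrative only: the function names are hypothetical, and the bin range is left as explicit parameters (k_min, k_max) because the choice of bins is a modelling decision; the defaults cover the five bins n_0 to n_4 named in the text.

```python
from math import lgamma, exp


def beta_binomial_pmf(k: int, S: int, a: float, b: float) -> float:
    """Beta-binomial probability of responding to k of S stimuli."""
    log_binom = lgamma(S + 1) - lgamma(k + 1) - lgamma(S - k + 1)
    log_b_num = lgamma(a + k) + lgamma(b + S - k) - lgamma(a + b + S)
    log_b_den = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp(log_binom + log_b_num - log_b_den)


def chi_square(counts, S: int, a: float, b: float,
               k_min: int = 0, k_max: int = 4) -> float:
    """Sum of (observed - expected)^2 / expected over bins k_min..k_max."""
    N = sum(counts)
    total = 0.0
    for k in range(k_min, k_max + 1):
        expected = N * beta_binomial_pmf(k, S, a, b)
        total += (counts[k] - expected) ** 2 / expected
    return total


# Hipp row of Table I; S = 97 and the (a, b) pair from Table II are assumed inputs.
hipp = [1019, 113, 30, 17, 7, 4, 1, 2, 0, 0, 0, 0, 1, 0, 0]
print("chi-square over n_0..n_4:", chi_square(hipp, 97, 0.17, 66.0))
```

With five bins and two fitted parameters, a value near 3 would indicate an acceptable fit under the criterion stated above.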


B. Extension of Model to Multiple-Neuron Units 

This subsection also follows the procedure developed in [19] for incorporating the effect of
multiple-neuron units. The experimentalists estimated the fraction of multiple-neuron units to
be p = 0.66. As in [19], we make the simplifying assumption that all units consisting of
multiple cells consist of two neurons.

Let ε^u_k(a, b) be the total probability that a unit selected at random responds to k out of S
stimuli. We follow [19] to derive the unit-level response, ε^u_k(a, b), in terms of the
neuron-level response given by ε_k. Then,

$$ \epsilon^{u}_k(a,b) = (1-p)\,\epsilon_k + p\,\epsilon_{(2),k} \tag{10} $$

where ε_(2),k is the probability that a randomly chosen double-neuron unit responds to k
stimuli, and ε_k is the probability of a single-neuron unit responding to k stimuli, given by
Eq. (5).

In order to derive ε_(2),k, we first note that a double-neuron unit responds to a stimulus if
either one of its constituent neurons responds. Let α_1, α_2 be the sparsities of the
constituent neurons. Then, assuming the two neurons are independent, the unit has an effective
sparsity given by

$$ \alpha_{\text{unit}} = 1 - (1-\alpha_1)(1-\alpha_2) \tag{11} $$

Similar to the single-neuron unit, the conditional probability that the double-neuron unit
responds to k stimuli out of S, assuming the unit sparsity is known, is given by the binomial
distribution with probability parameter α_unit:

$$ P(K = k \mid \alpha_{\text{unit}}) = \binom{S}{k} \alpha_{\text{unit}}^{k} (1-\alpha_{\text{unit}})^{S-k} \tag{12} $$

The total probability, ε_(2),k, is found by integrating the conditional probability over the
beta distribution, Eq. (2), for each constituent neuron:

$$ \epsilon_{(2),k} = \int_0^1 d\alpha_1 \int_0^1 d\alpha_2 \left[ D_{a,b}(\alpha_1)\, D_{a,b}(\alpha_2) \times \binom{S}{k} \alpha_{\text{unit}}^{k} (1-\alpha_{\text{unit}})^{S-k} \right] \tag{13} $$

Evaluating this integral yields (derivation in Appendix A):

$$ \epsilon_{(2),k} = \frac{1}{[B(a,b)]^2} \binom{S}{k} \sum_{j=0}^{k} \left\{ \binom{k}{j} \times B(a+j,\, b+S-j) \times B(a+k-j,\, b+S-k) \right\} \tag{14} $$



The form of the likelihood function is the same as Eq. (7), except that the unit-level
response probability ε^u_k of Eq. (10) is used instead of Eq. (5) for the single-unit response
probability:

$$ \mathcal{L}(a,b) = W\{n_0, n_1, \ldots, n_S\} \prod_{k=0}^{S} \left[\epsilon^{u}_k(a,b)\right]^{n_k} \tag{15} $$

Maximizing Eq. (15) with respect to a and b provides the best-fit parameters of the model, a_0
and b_0, to the data. The model prediction, n̄_k, has the same form as Eq. (6), only with
ε^u_k in place of ε_k.
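
The double-neuron response probability and the single/double mixture can be sketched numerically as follows. This is an illustrative implementation under the assumptions of this section (independent constituent neurons, every multi-unit consisting of exactly two cells); the function names are hypothetical, and the parameter values in the example are arbitrary apart from p = 0.66, which is the fraction quoted in the text.

```python
from math import lgamma, exp


def log_beta(x: float, y: float) -> float:
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_binom(n: int, k: int) -> float:
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def eps_single(k: int, S: int, a: float, b: float) -> float:
    """Single-neuron (beta-binomial) response probability."""
    return exp(log_binom(S, k) + log_beta(a + k, b + S - k) - log_beta(a, b))


def eps_double(k: int, S: int, a: float, b: float) -> float:
    """Response probability of a unit made of two independent neurons,
    written as a finite sum of products of beta functions."""
    prefactor = log_binom(S, k) - 2.0 * log_beta(a, b)
    return sum(
        exp(prefactor + log_binom(k, j)
            + log_beta(a + j, b + S - j)
            + log_beta(a + k - j, b + S - k))
        for j in range(k + 1)
    )


def eps_unit(k: int, S: int, a: float, b: float, p: float) -> float:
    """Mixture of single-neuron units (weight 1 - p) and double units (weight p)."""
    return (1.0 - p) * eps_single(k, S, a, b) + p * eps_double(k, S, a, b)


# Arbitrary illustrative parameters; p = 0.66 as quoted in the text.
S, a, b, p = 20, 0.3, 5.0, 0.66
print(sum(eps_unit(k, S, a, b, p) for k in range(S + 1)))
```

Two quick consistency checks are that the probabilities sum to one and that the k = 0 term of the double-unit formula equals the square of the single-neuron k = 0 term, since a double unit is silent only if both neurons are.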


III. RESULTS 
A. Fits to data 

In this section, we use the beta model to analyze data taken from the human MTL presented in
[24]. The data is recorded from 1194 Hipp units, 844 EC units, 947 Amy units, and 293 PHC
units. The patients were shown on average 97 images of familiar stimuli, presented in random
order. The n_k for each region are given in Table I. Maximizing the likelihood with respect to
a and b yields the parameter values given in Table II for the four regions. The goodness of
each fit was assessed using a χ² test. Plots of the fits against the data are given in Fig. 1.

All four regions are successfully fit by both the beta model containing single units and the
improved model containing a mixture of single- and double-neuron units (χ² values given in
Table II). This suggests that the data is consistent with the notion that the sparsity of human
MTL neurons follows a beta distribution, given in Eq. (2), with a < 1 and b > 1. The sparsity
distribution is far from uniform, consistent with the result in [19] when the two-population
model was used.

Including the double units in the model had little effect on the goodness of fit, though the
value of the a parameter is brought closer to zero. This results in a lower mean sparsity
compared with assuming only single-neuron units.

Plots of the best-fit sparsity distributions from the mixture model are given in Fig. 2. The
means of the distribution in each of the regions are: ⟨α⟩ = 1.6 × 10⁻³ in Hipp; 1.5 × 10⁻³ in
EC; 1.5 × 10⁻³ in Amy; 4.0 × 10⁻³ in PHC. In each region, a is close to zero, producing a
near-divergence in the neuronal sparsity distribution as α → 0. This indicates a large
population of extremely sparse neurons, consistent with the results of [19]. For these fits,
roughly 95% of the MTL neurons fall within the power-law regime, as is shown by the linear
region of the log-log plots in Fig. 2.
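
The mean of a beta distribution is a/(a + b), so the quoted means follow directly from the fitted parameters. The sketch below recomputes them from the multi-unit values in Table II and also estimates the fraction of neurons with sparsity above 0.01 by numerical integration. The helper names are hypothetical, and the integration uses the substitution u = α^a, chosen here (not taken from the paper) to remove the integrable singularity at α = 0.

```python
from math import lgamma, exp


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_cdf(x: float, a: float, b: float, n: int = 2000) -> float:
    """CDF of Beta(a, b) at x by Simpson's rule (n even) after substituting
    u = alpha**a. Assumes b >= 1, as in the fits of Table II."""
    log_B = lgamma(a) + lgamma(b) - lgamma(a + b)
    upper = x ** a
    h = upper / n

    def f(u: float) -> float:
        return max(0.0, 1.0 - u ** (1.0 / a)) ** (b - 1.0)

    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return (total * h / 3.0) / (a * exp(log_B))


# Multi-unit best-fit parameters (a, b) as listed in Table II.
fits = {"Hipp": (0.11, 67.0), "EC": (0.05, 36.0), "Amy": (0.05, 34.0), "PHC": (0.05, 13.0)}
for region, (a, b) in fits.items():
    tail = 1.0 - beta_cdf(0.01, a, b)
    print(f"{region}: mean = {beta_mean(a, b):.2e}, P(sparsity > 0.01) = {tail:.3f}")
```

Because the table entries are rounded, the recomputed means agree with the values in the text only to about the quoted precision.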

The sparsity distribution in the PHC differed quantitatively from the fits in the other three
regions. Firstly, the model predicts that neurons of the PHC have the highest mean sparsity,
⟨α⟩ = 4 × 10⁻³, compared to the other three regions, which have mean sparsities clustered near
1.5 × 10⁻³. Also, the tail of the sparsity distribution extends to larger k for the PHC than it
does for the other three regions, as shown in Fig. 1. This is consistent with the observation
that the selectivities of neurons tend to increase as information proceeds down the ventral
visual pathway from PHC to the EC, Hipp and Amy [24].


B. A Model for Representational Learning by 
Preferential Attachment 

One advantage of using statistical inference to estimate the underlying distribution of
sparsity, D(α), is that the form of the distribution can suggest a particular generating
mechanism. Beta distributions with near power-law behavior arise as limiting distributions in
numerous “rich-get-richer” schemes. For example, preferential attachment processes on growing
networks, where nodes with a high degree are more likely to receive edges from newly added
nodes [301 El], and Polya urn schemes [32], where the proportion of balls of a particular color
grows whenever a ball of that color is sampled from the urn, both yield beta distributions as
t → ∞.

In this section, we consider the bi-layered network model studied in Peruani et al. [33]. In
their model, the bottom layer of the network contains 𝒩 nodes which remain fixed in number,
while the top layer consists of t nodes that are added one at a time, starting from t = 0. As
nodes are added, they connect with nodes in the bottom layer with a higher probability of
attaching to nodes with a large degree.
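
The growth process just described can be sketched as a small simulation. Everything specific in this example is an assumption made for illustration: the attachment weight is taken to be linear in the current degree (gamma * k + 1), the weights are held fixed while one stimulus is wired in, each stimulus connects to mu distinct bottom-layer nodes, and the function and parameter names are hypothetical.

```python
import random


def grow_network(n_neurons: int, n_stimuli: int, mu: int, gamma: float, seed: int = 0):
    """Grow a two-layer network and return the degree of each bottom-layer node.

    Each new top-layer node (stimulus) attaches to mu distinct bottom-layer
    nodes (neurons), chosen with probability proportional to gamma * degree + 1.
    """
    rng = random.Random(seed)
    degree = [0] * n_neurons
    indices = range(n_neurons)
    for _ in range(n_stimuli):
        # Assumed rule: weights are computed once per stimulus.
        weights = [gamma * k + 1.0 for k in degree]
        chosen = set()
        while len(chosen) < mu:
            chosen.add(rng.choices(indices, weights=weights)[0])
        for j in chosen:
            degree[j] += 1
    return degree


# Hypothetical sizes, far smaller than a real neural population.
degree = grow_network(n_neurons=200, n_stimuli=100, mu=5, gamma=10.0, seed=1)
sparsity = [k / 100 for k in degree]  # fraction of stimuli each neuron codes for
print("max sparsity:", max(sparsity), "fraction never used:", sparsity.count(0.0) / 200)
```

Setting gamma to zero recovers purely random assignment, which makes it easy to compare the skewed and unskewed cases side by side.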

We interpret the fixed bottom layer as the binary neurons of the MTL responsible for object
encoding, and we interpret the top layer as stimuli related to familiar concepts which have
been previously encoded into memory. The addition of a node in the top layer indicates a new
concept that is to be coded into memory, e.g. an unfamiliar person to whom you have just been
introduced. An edge between stimulus i in the top layer and neuron j in the bottom layer
indicates that neuron j is part of the code for i (i.e. neuron j activates whenever stimulus i
is presented to the organism). The attachment procedure for new nodes represents the complex
neurological learning processes by which new concepts are coded onto the neural substrate of
the human MTL. Thus, the growth of the top layer with the attachment of edges to the bottom
layer is a model for representational learning of a binary code. The sparsity of neuron j,
α_j, is given by

$$ \alpha_j = \frac{k_j^t}{t} \tag{16} $$
FIG. 1: Best fits to data recorded in four regions of the human MTL assuming all recorded units consist of a single neuron. 
The red circles indicate the experimental values from Table I, and blue dots connected by lines indicate the best-fit predictions 
for the expectation values n̄_k, with double-neuron units included, predicted by the beta-binomial model. The dotted lines 
indicate best fits for a pure binomial model, i.e. assuming all cells have the same sparsity. Note the log scale on the y-axis. 

FIG. 2: Sparsity distributions across neurons in each region predicted by the multi-unit model consisting of single units and 
double units. The plots are shown using log-log axes to indicate power-law behavior at low sparsity. The shaded region indicates 
the upper 5% tail of the sparsity distribution, i.e. the model predicts 95% of the neurons in each region have a sparsity left of 
the shaded area. The vertical dotted lines indicate the mean sparsities in each region. 
        n_0   n_1   n_2   n_3   n_4   n_5   n_6   n_7   n_8   n_9   n_10  n_11  n_12  n_13  n_14
Hipp   1019   113    30    17     7     4     1     2     0     0     0     0     1     0     0
EC      761    45    15     9     4     8     0     0     1     0     0     0     0     1     0
Amy     842    61    17    15     3     3     1     0     1     1     0     0     0     1     2
PHC     244    13    11     7     3     0     4     1     2     4     3     0     1     0     0

TABLE I: Number of units n_k responding to k images as reported by [24] in four MTL regions. 


         Single-Unit Model                 Multi-Unit Model
         Hipp    EC      Amy     PHC       Hipp    EC      Amy     PHC
a        0.17    0.08    0.09    0.08      0.11    0.05    0.05    0.05
b        66      36      34      12        67      36      34      13
χ²(5)    6.1     1.9     8.6     3.3       2.1     0.56    5.2     2.7

TABLE II: Best fit parameters of the beta distribution a and b for the four MTL regions. Left-hand table gives values assuming 
all recorded units consist of a single neuron. Right-hand table gives values assuming a mixture of single and double neuron 
units. χ² values evaluated for k = 1 to k = 10. 


where k_j^t is the node degree of neuron j after t stimuli have been added in the top layer.

Following Peruani et al. [33], the network is grown by adding a node to the top layer at each
time step, t, and then attaching it to μ nodes in the bottom layer. In our model, μ is the code
word length of each stimulus and is assumed to be fixed. We assume μ ≪ 𝒩. As each of the μ
edges is added, the probability A(k_j^t) that it attaches to neuron j with node degree k_j^t is
defined by

$$ A(k_j^t) = \frac{\gamma k_j^t + 1}{C(t)} \tag{17} $$

where γ is a parameter that determines the influence of node degree on the attachment process
and C(t) is a normalization constant. For γ = 0, the edges are attached at random to the
neurons with A(k_j) = 1/𝒩 for each j. In the case where μ ≪ 𝒩, the probability that neuron j
connects to the stimulus added at time t after all μ edges have been connected is approximately

$$ P(k_j^t) = \frac{\mu}{C(t)} \left(\gamma k_j^t + 1\right) \tag{18} $$

When γ = 0, a newly added stimulus attaches to neuron j with probability P(k_j) = μ/𝒩.

The network is grown starting from t = 0, with no stimuli in the top layer and all node degrees
in the bottom layer equal to zero. Equation (18) defines the attachment process as each
stimulus is added. We are interested in the sparsity distribution, D(k/t), of neurons in the
bottom layer after a large number of stimuli have been learned. Peruani et al. [33] showed that
for γ > 0, in the limit of large t,

$$ D\!\left(\frac{k}{t}\right) = \frac{1}{B(r,s)} \left(\frac{k}{t}\right)^{r-1} \left(1 - \frac{k}{t}\right)^{s-1}, \quad \text{where } r = \frac{1}{\gamma} \text{ and } s = \frac{\mathcal{N}}{\gamma\mu} - \frac{1}{\gamma}. \tag{19} $$

In the case of purely random attachment (γ = 0), the distribution D(k/t) → δ(k/t − μ/𝒩) as
t → ∞, i.e. the neuronal sparsity is the same for all neurons. Purely random attachment is
inconsistent with the data fit above, as can be seen from the dotted lines in Fig. 1.

If we treat the data analyzed above as a random sample of N neurons from the bottom layer and S
stimuli from the top layer, then we can match the beta parameters r and s with the fits
reported above. This gives an estimate of roughly 10 < γ < 20 and ^ ^ < 5 ^. These values for γ
suggest from Eq. (17) that preferential attachment plays a large role in assigning new stimuli
to MTL neurons. In other words, it is indirect evidence that neurons are not randomly assigned
to new concepts, but rather new concepts are more likely to be coded onto neurons that have
already been assigned to previously learned concepts.


IV. DISCUSSION

A. Relation to Previous Work


This paper builds upon previous work [191 IS] within which neurons were assumed to be split
into two discrete populations: a sparse population comprising roughly 5%-10% of the MTL
neurons, with a sparsity on the order of 10 “^; and an ultra-sparse population comprising the
remaining 90%-95%, with a sparsity on the order of 10“^. There, the neurons in each population
were assumed to have the same sparsity value. This two-population model produced good fits in
the Hipp and the EC, but produced poorer fits in the Amy and the PHC. The beta model developed
in this paper is a continuous distribution, and it fits all four regions adequately, including
the Hipp and the EC.

Thus, there are two radically different sparsity distributions that are consistent with the
single-unit responses from the Hipp and EC. Because the data show very large n_0 and n_1 bins
compared with the bins at higher k, the fitted sparsity distributions at small α are largely
determined by these two bins. There is insufficient statistical power to distinguish between
different sparsity distributions in this low-sparsity regime.

To illustrate the connection between the beta model and the two-population model, the shaded
region seen in Fig. 2 roughly corresponds to the sparse population (population labeled “D” in
[19]), while the neurons in the unshaded region are analogous to the ultra-sparse population
(labeled “US” in [19]). In the two-population model, the distribution would have a Dirac delta
function located in the shaded region and another located in the unshaded region, representing
the two populations.

Consequently, the exact form of the beta distribution should not be taken too literally, as
there are likely to be other continuous distributions that are consistent with the data.
However, one would still expect similar overall behaviors regardless of which distribution is
chosen. The advantage of the beta distribution is that it yields compact analytical results in
the likelihood analysis for both the single-unit model and the multi-unit model.


B. Similarity to Place Cell Statistics 

Recent experiments have revealed that place cells in the rat hippocampus display skewed
activity, in which the number of place fields recruited by place cells exhibits a heavier tail
than would be predicted if the cells all recruited place fields at the same rate [28]. In the
experiment performed by Rich et al. [28], the recruitment of place fields among the cells of
CA1 obeys a skewed gamma-Poisson process, in which each cell acquires place fields according to
a Poisson process, but the rate parameter for each cell is sampled from a gamma distribution.
This results in a Poisson distribution of place fields per cell mixed by a gamma distribution.
In this paper, the number of concept fields of MTL neurons is shown above to follow a binomial
distribution mixed with a beta distribution. The gamma-Poisson and beta-binomial distributions
are closely related, with the gamma-Poisson being a limiting case of the beta-binomial.

Place cells of the rat and concept cells in humans share many characteristics [29], and
observing nearly identical distributions of activity in both populations suggests that the
place cell code and the concept cell code likely arise from similar underlying processes. In
other words, the results suggest that the recruitment of place fields by the place cells of the
rat hippocampus and the recruitment of “concept fields” by the concept cells of the human MTL
are two manifestations of the same neural mechanism, despite coding for spatial information vs
conceptual information.

Furthermore, finding nearly identical distributions in different species across cells with
strikingly different receptive fields lends more credence to the idea that skewed distributions
in general play a fundamental role in neural functioning [laiiT].

One possible mechanism for generating skewed distributions, especially distributions that have
approximate power-law behavior, is the family of cumulative advantage or “rich get richer”
schemes, such as the preferential attachment processes on growing networks explored above.
Another possibility, briefly explored in [28], is that the non-uniform recruitment of receptive
fields arises from intrinsic cell differences, such as non-uniform excitability and
pre-synaptic inputs. I suggest that these two hypotheses are not mutually exclusive, and that
both may play a role in generating the observed sparse, skewed distributions.


C. Impact of Silent Cells 

The spike sorting techniques for single-unit recordings only detect a small fraction of the
neurons within range of the electrode. The remaining neurons, constituting perhaps as much as
90% of the overall cells, remain completely silent during the experiment and thus are missed
completely by the spike sorting algorithm [55It38j. This means that the sample of neurons is
biased in favor of more active cells.

To clarify, these silent cells, or “neural dark matter,” are distinct from the n_0 cells
reported in [24], which did not produce above-threshold firing rates for any of the presented
stimuli. The population of n_0 cells still emitted spikes and thus were detected by the spike
sorting techniques used. The silent cells, on the other hand, emitted no spikes, and thus could
not be detected. In effect, these cells should be included in the n_0 bin to give a less biased
estimate of the sparsity distribution. The result on the fits would be to lower the value of a,
bringing it even closer to zero.

Some studies estimate the silent cell population to be as high as a factor of ten [5^. That is,
there are perhaps ten silent cells for each recorded cell. Using this factor of cells added to
the n_0 bin, maximum likelihood analysis yields a = 0.007 and b = 55 in the hippocampus, with
χ² = 1.2. This brings the sparsity distribution much closer to a power law, indicating a
considerably sparser code.
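
The silent-cell correction amounts to enlarging the n_0 bin before refitting. A minimal sketch of that bookkeeping step is given below; the function name is hypothetical, and the factor of ten is the upper estimate mentioned in the text rather than a measured value.

```python
def add_silent_cells(counts, silent_per_recorded: int):
    """Return a copy of the binned counts with silent cells added to bin 0.

    silent_per_recorded is the assumed number of undetected, silent neurons
    for every neuron that was actually recorded.
    """
    n_recorded = sum(counts)
    adjusted = list(counts)
    adjusted[0] += silent_per_recorded * n_recorded
    return adjusted


# Hipp row of Table I, with ten assumed silent cells per recorded cell.
hipp = [1019, 113, 30, 17, 7, 4, 1, 2, 0, 0, 0, 0, 1, 0, 0]
hipp_with_silent = add_silent_cells(hipp, silent_per_recorded=10)
print(hipp_with_silent[0], sum(hipp_with_silent))
```

The adjusted counts can then be passed to the same maximum likelihood procedure as the original data.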
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Appendix A: Derivation of multi-unit response 
probability 


In this section we derive equation (14) starting from 
the integral given in equation (13): 

$$
\epsilon_{(2),k} = \int_0^1 d\alpha_1 \int_0^1 d\alpha_2 \left[ D_{a,b}(\alpha_1)\, D_{a,b}(\alpha_2) \times \binom{S}{k}\, \alpha_{\rm unit}^{k} \left(1 - \alpha_{\rm unit}\right)^{S-k} \right], \tag{A1}
$$

where $D_{a,b}(\alpha)$ is the beta distribution PDF (given in the main text and restated in equation (B2)), and $\alpha_{\rm unit}$ is the effective sparsity for the 
double unit given by equation (11). 

To evaluate the integral, we first note that: 

$$
\left(1 - \alpha_{\rm unit}\right)^{S-k} = \left[(1 - \alpha_1)(1 - \alpha_2)\right]^{S-k}. \tag{A2}
$$

Then, we expand the factor $\alpha_{\rm unit}^{k}$ in equation (A1) using the binomial theorem: 

$$
\alpha_{\rm unit}^{k} = \left(\alpha_1 + \alpha_2 (1 - \alpha_1)\right)^{k} = \sum_{j=0}^{k} \binom{k}{j}\, \alpha_1^{j}\, \alpha_2^{k-j} (1 - \alpha_1)^{k-j}. \tag{A3}
$$

Using these results and the beta distribution PDF, we can write equation (A1) as a sum of separable integrals: 

$$
\epsilon_{(2),k} = \frac{1}{[B(a,b)]^2} \binom{S}{k} \sum_{j=0}^{k} \binom{k}{j} \times \int_0^1 d\alpha_1\, \alpha_1^{a+j-1} (1 - \alpha_1)^{b+S-j-1} \times \int_0^1 d\alpha_2\, \alpha_2^{a+k-j-1} (1 - \alpha_2)^{b+S-k-1}. \tag{A4}
$$

The integrals can now be evaluated as beta functions: 

$$
\epsilon_{(2),k} = \frac{1}{[B(a,b)]^2} \binom{S}{k} \sum_{j=0}^{k} \binom{k}{j} \times B(a + j,\, b + S - j) \times B(a + k - j,\, b + S - k), \tag{A5}
$$

yielding the result given by equation (14). 


Appendix B: Gamma-Poisson distribution as a 
limiting case of the beta-binomial distribution 

In this appendix, I show that the gamma-Poisson distribution 
is a limiting case of the beta-binomial distribution. 
I show it here because a convenient reference 
showing this result could not be located. 

The beta-binomial distribution is a mixture distribution 
of a binomial distribution, with parameters $S$ and 
$0 < \alpha < 1$, where the probability parameter $\alpha$ is sampled 
from a beta distribution with parameters $a > 0$ and 
$b > 0$. The PMF of the binomial distribution and the 
PDF of the beta distribution are given respectively by 

$$
P(K = k) = \binom{S}{k}\, \alpha^{k} (1 - \alpha)^{S-k} \tag{B1}
$$

and 

$$
D(\alpha) = \frac{\alpha^{a-1} (1 - \alpha)^{b-1}}{B(a, b)}. \tag{B2}
$$

Thus, the PMF of the beta-binomial distribution with 
parameters $S$, $a$, and $b$ is given by: 

$$
P(K = k) = \binom{S}{k} \frac{1}{B(a, b)} \int_0^1 d\alpha\, \alpha^{a-1} (1 - \alpha)^{b-1}\, \alpha^{k} (1 - \alpha)^{S-k} = \binom{S}{k} \frac{B(a + k,\, b + S - k)}{B(a, b)}. \tag{B3}
$$

The gamma-Poisson distribution, more commonly called 
the negative binomial distribution, is a Poisson distribution 
with parameter $\lambda > 0$, where $\lambda$ is sampled from a 
gamma distribution with parameters $r > 0$ and $\beta > 0$. 
The PMF of the Poisson distribution and the PDF of the 
gamma distribution are given by: 


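$$
P(K = k) = \frac{\lambda^{k} e^{-\lambda}}{k!} \tag{B4}
$$

and 

$$
G(\lambda) = \frac{r^{\beta}}{\Gamma(\beta)}\, \lambda^{\beta-1} e^{-r\lambda}, \tag{B5}
$$

and the gamma-Poisson distribution with parameters $r$ 
and $\beta$ is then given by 

$$
\begin{aligned}
P(K = k) &= \frac{r^{\beta}}{k!\, \Gamma(\beta)} \int_0^{\infty} d\lambda\, \lambda^{k} e^{-\lambda}\, \lambda^{\beta-1} e^{-r\lambda} \\
&= \frac{r^{\beta}}{k!\, \Gamma(\beta)} \int_0^{\infty} d\lambda\, \lambda^{k+\beta-1} e^{-(r+1)\lambda} \\
&= \frac{r^{\beta}}{k!\, \Gamma(\beta)} \int_0^{\infty} \frac{dx}{r+1} \left(\frac{x}{r+1}\right)^{k+\beta-1} e^{-x} \\
&= \frac{1}{k!}\, \frac{\Gamma(k + \beta)}{\Gamma(\beta)}\, \frac{r^{\beta}}{(r + 1)^{k+\beta}}.
\end{aligned} \tag{B6}
$$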
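
Equation (B6) can be checked numerically. The sketch below compares the closed form against a midpoint-rule evaluation of the Poisson-gamma mixture integral. The parameter values, integration cutoff, and function names are illustrative assumptions, not taken from the analysis in this paper.

```python
import math

def gamma_poisson_pmf(k, r, beta):
    """Closed form (B6): Gamma(k+beta) / (k! Gamma(beta)) * r^beta / (r+1)^(k+beta)."""
    log_p = (math.lgamma(k + beta) - math.lgamma(k + 1) - math.lgamma(beta)
             + beta * math.log(r) - (k + beta) * math.log(r + 1))
    return math.exp(log_p)

def mixture_integral(k, r, beta, upper=60.0, steps=20_000):
    """Midpoint-rule integral over lambda of the Poisson PMF (B4) times the gamma PDF (B5)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_poisson = k * math.log(lam) - lam - math.lgamma(k + 1)
        log_gamma_pdf = (beta * math.log(r) - math.lgamma(beta)
                         + (beta - 1) * math.log(lam) - r * lam)
        total += math.exp(log_poisson + log_gamma_pdf)
    return total * h

r, beta = 0.5, 2.0  # illustrative values only
closed = [gamma_poisson_pmf(k, r, beta) for k in range(5)]
numeric = [mixture_integral(k, r, beta) for k in range(5)]
for k in range(5):
    print(k, closed[k], numeric[k])
```

The cutoff `upper` truncates the integral where the integrand is negligible for these parameters. Values of $\beta$ below one make the integrand singular at $\lambda = 0$ and would need a finer or adaptive rule.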


To show that (B3) yields (B6) as a limiting case, we begin by 
observing that the Poisson distribution is a limiting case 
of the binomial distribution as $S \to \infty$ while holding the 
mean, $\langle k \rangle = S\alpha$, constant. The Poisson parameter $\lambda$ is 
the expected response, i.e. $\lambda = \langle k \rangle$. 

Similarly, the gamma-Poisson distribution is the limiting 
case of the beta-binomial distribution as $S \to \infty$ 
while ensuring the mean $\langle k \rangle = S \langle \alpha \rangle$ for the beta-binomial 
distribution is finite. From equation (B2), we see that 


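$$
\langle \alpha \rangle = \frac{1}{B(a, b)} \int_0^1 d\alpha\, \alpha^{a} (1 - \alpha)^{b-1} = \frac{B(a + 1, b)}{B(a, b)} = \frac{a}{a + b}. \tag{B7}
$$

So, 

$$
\langle k \rangle = S \langle \alpha \rangle = \frac{S a}{a + b}. \tag{B8}
$$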


One way to ensure $\langle k \rangle$ is finite as $S \to \infty$ is for the 
parameter $b$ to approach infinity as a linear function of 
$S$ by setting $b = rS$. The constant $r > 0$ is arbitrary. It 
will be shown that $r$ matches the parameter of the 
gamma distribution above when the limit is taken. 

Before taking the limit by setting $b = rS$ in equation 
(B3), it is useful to write the beta functions and binomial 
coefficient in terms of gamma functions using the 
identities 

$$
\binom{S}{k} = \frac{\Gamma(S + 1)}{\Gamma(k + 1)\, \Gamma(S - k + 1)} \tag{B9}
$$

and 

$$
B(x, y) = \frac{\Gamma(x)\, \Gamma(y)}{\Gamma(x + y)}. \tag{B10}
$$

Using these identities, we can write equation (B3) as 

$$
\begin{aligned}
P(K = k) &= \frac{\Gamma(S + 1)}{\Gamma(k + 1)\, \Gamma(S - k + 1)} \times \frac{\Gamma(a + k)\, \Gamma(S - k + b)}{\Gamma(a + b + S)} \times \frac{\Gamma(a + b)}{\Gamma(a)\, \Gamma(b)} \\
&= \frac{\Gamma(S + 1)}{\Gamma(k + 1)\, \Gamma(S - k + 1)} \times \frac{\Gamma(a + k)\, \Gamma(S(1 + r) - k)}{\Gamma(a + S(1 + r))} \times \frac{\Gamma(a + rS)}{\Gamma(a)\, \Gamma(rS)}.
\end{aligned} \tag{B11}
$$


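Now, to take the limit as $S \to \infty$, we use Stirling’s approximation 
applied to ratios of gamma functions: 

$$
\frac{\Gamma(x + \gamma)}{\Gamma(x + \omega)} \approx x^{\gamma - \omega} \tag{B12}
$$

for large $x$. We get: 

$$
\begin{aligned}
P(K = k) &\approx \frac{S^{k}}{\Gamma(k + 1)} \times \frac{\Gamma(a + k)}{[S(1 + r)]^{a+k}} \times \frac{(rS)^{a}}{\Gamma(a)} \\
&= \frac{1}{k!} \times \frac{\Gamma(a + k)}{(1 + r)^{a+k}} \times \frac{r^{a}}{\Gamma(a)}.
\end{aligned} \tag{B13}
$$

Comparing equations (B13) and (B6), we see that they 
match, and that $a = \beta$. This concludes the derivation. 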
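
The limit can also be checked numerically. The following is a minimal sketch with illustrative values of $a$ and $r$ and hypothetical function names. It evaluates the beta-binomial PMF with $b = rS$ for increasing $S$ and compares it with the gamma-Poisson PMF with $\beta = a$.

```python
import math

def log_beta(x, y):
    """log B(x, y) computed from log-gamma functions."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def beta_binomial_pmf(k, S, a, b):
    """Beta-binomial PMF: C(S, k) * B(a + k, b + S - k) / B(a, b)."""
    log_choose = math.lgamma(S + 1) - math.lgamma(k + 1) - math.lgamma(S - k + 1)
    return math.exp(log_choose + log_beta(a + k, b + S - k) - log_beta(a, b))

def gamma_poisson_pmf(k, r, beta):
    """Gamma-Poisson PMF: Gamma(k+beta) / (k! Gamma(beta)) * r^beta / (r+1)^(k+beta)."""
    log_p = (math.lgamma(k + beta) - math.lgamma(k + 1) - math.lgamma(beta)
             + beta * math.log(r) - (k + beta) * math.log(r + 1))
    return math.exp(log_p)

a, r = 0.5, 0.5  # illustrative values only
max_diffs = {}
for S in (10**2, 10**4, 10**6):
    b = r * S  # b grows linearly with S
    max_diffs[S] = max(
        abs(beta_binomial_pmf(k, S, a, b) - gamma_poisson_pmf(k, r, a))
        for k in range(6)
    )
    print(S, max_diffs[S])
```

The largest discrepancy over $k = 0, \ldots, 5$ shrinks roughly in proportion to $1/S$, as expected from the leading correction to Stirling's approximation.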